DeepSNVMiner: a sequence analysis tool to detect emergent, rare mutations in subsets of cell populations
Abstract
Background. Massively parallel sequencing is being used to sequence highly diverse populations of DNA, such as those derived from heterogeneous cell mixtures containing both wild-type and disease-related states. At the core of the molecule tagging techniques used for this purpose is the tagging and identification of sequence reads derived from individual input DNA molecules. These reads must first be computationally disambiguated into read groups sharing common sequence tags, with each read group representing a single input DNA molecule. This disambiguation typically generates huge numbers of read groups, each of which requires its own variant detection analysis, which represents a significant computational challenge. While the sequencing technologies for producing these data are approaching maturity, the lack of computational tools for analysing such heterogeneous sequence data is an obstacle to the widespread adoption of this technology.
Results. Using synthetic data, we detect unique variants at dilution levels of 1 in 1,000,000 molecules, and find that DeepSNVMiner obtains significantly lower false positive and false negative rates than the popular variant callers GATK, SAMTools, FreeBayes and LoFreq, particularly as variant concentrations decrease. In a dilution series with genomic DNA from two cell lines, DeepSNVMiner identifies a known somatic variant present at a concentration of only 1 in 1,000 molecules in the input material, the lowest detected concentration amongst all variant callers tested.
Conclusions. Here we present DeepSNVMiner, a tool to disambiguate tagged sequence groups and robustly identify sequence variants specific to subsets of starting DNA molecules that may indicate the presence of a disease. DeepSNVMiner is an automated workflow of custom sequence analysis utilities and open source tools that differentiates somatic DNA variants from artefactual sequence variants that likely arose during DNA amplification. The workflow is flexible enough to be customised to variations in the data production protocol, and supports reproducible analysis through detailed logging and reporting of results. DeepSNVMiner is available for academic non-commercial research purposes at https://github.com/mattmattmattmatt/DeepSNVMiner.
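The general idea of grouping tagged reads by molecule and then separating molecule-level variants from amplification artefacts can be illustrated with a minimal Python sketch. This is not DeepSNVMiner's actual implementation. It assumes (hypothetically) that each read begins with a fixed-length molecular tag, that the remaining sequence is already aligned to the reference from position 0 with no indels, and that a variant is accepted for a molecule only when a read group is large enough (`min_reads`) and nearly all of its reads agree (`min_fraction`). The function names and thresholds are illustrative only.

```python
from collections import Counter, defaultdict


def group_reads_by_tag(reads, tag_length):
    """Group read sequences by their molecular tag.

    Assumption (hypothetical layout): the tag is the first `tag_length`
    bases of each read, and the rest of the read is the insert sequence.
    """
    groups = defaultdict(list)
    for read in reads:
        groups[read[:tag_length]].append(read[tag_length:])
    return dict(groups)


def call_group_variants(groups, reference, min_reads=3, min_fraction=0.9):
    """Count input molecules (read groups) that carry each variant.

    Assumption: inserts are pre-aligned to `reference` starting at
    position 0 with no indels. A mismatch shared by nearly all reads of a
    group is treated as present in the original molecule; a mismatch seen
    in only some reads is treated as an amplification or sequencing artefact.
    """
    molecules_with_variant = Counter()
    for seqs in groups.values():
        if len(seqs) < min_reads:
            continue  # too few reads to separate true variants from errors
        for pos, ref_base in enumerate(reference):
            bases = Counter(s[pos] for s in seqs if pos < len(s))
            if not bases:
                continue
            base, count = bases.most_common(1)[0]
            if base != ref_base and count / sum(bases.values()) >= min_fraction:
                molecules_with_variant[(pos, ref_base, base)] += 1
    return molecules_with_variant


if __name__ == "__main__":
    reference = "ACGTACGT"
    reads = (["AAAA" + "ACGTACGT"] * 3                              # wild-type molecule
             + ["CCCC" + "ACGTTCGT"] * 3                            # true variant at position 4
             + ["GGGG" + "ACGTACGT"] * 2 + ["GGGG" + "ACGAACGT"]    # error in one read only
             + ["TTTT" + "ACGTTCGT"] * 2)                           # group below min_reads
    groups = group_reads_by_tag(reads, tag_length=4)
    print(call_group_variants(groups, reference))
```

In this toy input only the `CCCC` group reports the A>T change at position 4: the single mismatching read in the `GGGG` group is outvoted and treated as an artefact, and the two-read `TTTT` group is skipped because it falls below `min_reads`. Counting molecules rather than reads is what lets a tag-based approach distinguish a variant carried by one input molecule from an error introduced during amplification.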
Volume 4
Published: 2016